#setup
%load_ext pretty_jupyter

import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import country_converter as coco
import seaborn as sns
import matplotlib.pyplot as plt
import geopandas as gpd
import numpy as np
import plotly.express as px

from IPython.display import HTML
HTML("""
<style>
.output_png {
    display: table-cell;
    text-align: center;
    vertical-align: middle;
}
</style>
""")

University of Sydney Precision Data Centre: Winter Data Analysis Challenge 2024

Data Provenance

Life Expectancy (1770 - 2021)

#load initial dataset
raw = pd.read_csv("life-expectancy.csv")

#define entity categories
continent = ['Africa', 'Asia', 'Europe', 'Americas', 'Oceania']
income_groups = ['High-income countries', 'Upper-middle-income countries', 'Middle-income countries', 'Lower-middle-income countries', 'Low-income countries', 'No income group available']
development = ['More developed regions', 'Small Island Developing States (SIDS)', 'Less developed regions', 'Less developed regions, excluding China', 'Less developed regions, excluding least developed countries', 'Least developed countries', 'Land-locked Developing Countries (LLDC)']

#create new df for continents, income level, development status
continents = raw[raw['Entity'].isin(continent)]
income = raw[raw['Entity'].isin(income_groups)]
develop = raw[raw['Entity'].isin(development)]

#create new df for countries (everything that is not an aggregate entity)
#.copy() so that columns can be added later without touching raw
countries = raw[~raw['Entity'].isin(development + income_groups + continent)].copy()

#add continent column for countries df
converter = coco.CountryConverter()
countries['Continent'] = converter.convert(names=countries['Code'], src="ISO3", to="continent")

#find countries without continent match - all European countries
a = countries[countries['Code'].isna() == True]
a = a[['Entity', 'Code']]
grouped = a.groupby(by="Entity").sum()

#update continent column
countries['Continent'] = countries['Continent'].str.replace('not found', 'Europe')

Disease Mortality Rates (2010 - 2021)

# load dataset (https://ghdx.healthdata.org/record/ihme-data/gbd-2021-cause-specific-mortality-1990-2021)
disease = pd.read_excel("disease-mortality-rates.XLSX")

#create a subset with only mortality rate per 100,000 people
#.copy() so the column assignments below modify an independent dataframe
disease = disease[['location_type', 'location_name', 'cause_name', '2010 (ASMR)', '2019 (ASMR)', '2020 (ASMR)', '2021 (ASMR)']].copy()

#convert columns to numeric
cols = ['2010 (ASMR)', '2019 (ASMR)', '2020 (ASMR)', '2021 (ASMR)']
for i in cols:
    disease[i] = disease.loc[:, i].str.replace(r"\(.*\)","", regex=True)
    disease[i] = pd.to_numeric(disease.loc[:, i])
    
#create column for change from 2010 to 2021
disease.loc[:, 'change'] = (disease.loc[:, '2021 (ASMR)'] - disease.loc[:, '2010 (ASMR)'])

#create global dataset
globe = disease[disease['location_type'] == 'Global']

#create global grouped dataset by cause
globe_group = globe[['cause_name', '2010 (ASMR)', '2019 (ASMR)', '2020 (ASMR)', '2021 (ASMR)', 'change']].groupby(by='cause_name').mean().reset_index()

#create africa dataset
a = ['North Africa and Middle East', 'Central Sub-Saharan Africa', 'Eastern Sub-Saharan Africa', 'Southern Sub-Saharan Africa', 'Western Sub-Saharan Africa']
africa = disease[(disease['location_type'] == 'Region') & disease['location_name'].isin(a)]

#create grouped africa dataset by cause
ac = africa.copy()
ac = ac[['cause_name', '2010 (ASMR)', '2019 (ASMR)', '2020 (ASMR)', '2021 (ASMR)', 'change']]
africa_group = ac.groupby(by='cause_name').mean().reset_index()

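#create column of difference between african vs global average in 2021
#match rows on cause_name rather than relying on both dataframes sharing the same row order
global_2021 = globe_group.set_index('cause_name')['2021 (ASMR)']
africa_group['diff'] = africa_group['2021 (ASMR)'] - africa_group['cause_name'].map(global_2021)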
africa_group = africa_group.sort_values(by='diff', ascending=False)
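
A note on the regional-versus-global difference computed above: subtracting one pandas Series from another pairs values by index label, not by the cause they describe. The sketch below uses made-up numbers and hypothetical frame names (`regional`, `global_avg`), not values from the IHME dataset, to show how the two approaches can diverge when the row order differs, and how a merge on `cause_name` keeps each cause paired with its own global value.

```python
import pandas as pd

# Made-up illustrative values; these are NOT taken from the dataset.
regional = pd.DataFrame({
    "cause_name": ["HIV/AIDS", "Malaria", "Stroke"],
    "2021 (ASMR)": [50.0, 40.0, 80.0],
})
global_avg = pd.DataFrame({
    "cause_name": ["Malaria", "Stroke", "HIV/AIDS"],  # same causes, different row order
    "2021 (ASMR)": [10.0, 90.0, 8.0],
})

# Plain subtraction pairs rows by index label (0, 1, 2), so causes get mixed up here.
by_position = regional["2021 (ASMR)"] - global_avg["2021 (ASMR)"]

# Merging on the key column pairs each cause with its own global value.
merged = regional.merge(global_avg, on="cause_name", suffixes=("_region", "_global"))
merged["diff"] = merged["2021 (ASMR)_region"] - merged["2021 (ASMR)_global"]
merged = merged.sort_values(by="diff", ascending=False).reset_index(drop=True)

print(by_position.tolist())
print(merged[["cause_name", "diff"]])
```

When both grouped frames contain exactly the same causes in the same order, the two approaches agree; matching on the key simply removes the dependence on that assumption.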

Life Expectancy Decomposition (1990 - 2021)

#load dataset (https://cloud.ihme.washington.edu/s/6w3TkFaQw63Djnd?)
decomp = pd.read_excel("life-expectancy-decomp.XLSX")

#create africa df
af = ['Western Sub-Saharan Africa', 'North Africa and Middle East', 'Central Sub-Saharan Africa', 'Southern Sub-Saharan Africa', 'Eastern Sub-Saharan Africa']
decomp_africa = decomp[decomp['Location Name'].isin(af)]

#filter 1990-2021 only
decomp_africa = decomp_africa[(decomp_africa['Start Year'] == 1990) & (decomp_africa['End Year'] == 2021)]
decomp_africa = decomp_africa.drop(axis=0, index=670)

Death in Armed Conflicts (1989 - 2022)

#load df (Uppsala Conflict Data Program (2023); Natural Earth (2022) – processed by Our World in Data)
conflict = pd.read_csv("deaths-in-armed-conflicts.csv")

Healthcare Expenditure (2000 - 2021)

expense = pd.read_csv("health-expenditure.csv")

GDP

#load df
gdp = pd.read_csv("gdp.csv")
gdp = gdp.dropna()

#create country code column
converter = coco.CountryConverter()
gdp['iso_a3'] = converter.convert(names=gdp['Country '], to="ISO3")
#check which ones were missed - only regions, all country codes found
#gdp[gdp['iso_a3'] == 'not found']

Introduction

This report aims to deepen understanding of how Life Expectancy in Africa has increased over time, compared to global trends. To begin, we will look at the Life Expectancy over time for each continent.

plt.figure(figsize=(12, 6))
sns.lineplot(data=continents, x='Year', y='Period life expectancy at birth - Sex: all - Age: 0', hue='Entity')
plt.title('Life Expectancy Over Time by Continent')
plt.xlabel('Year')
plt.ylabel('Life Expectancy')
plt.legend()
plt.show()

Global Life Expectancy seemingly flatlined from 1770 to 1870, after which Europe, Oceania and the Americas begin to rise rapidly. It is important to note that data before 1950 is very sparse, so while it looks like the rise began in 1870, it could have started at any point in the 30 years between 1870 and 1900. Regardless, the rise corresponds well with Louis Pasteur's development of Germ Theory in the 1860s, and his later development of vaccines for anthrax, fowl cholera and rabies (TODO SOURCE). It seems that these revolutionary discoveries laid the groundwork for the continuous development of modern medicine, and thus for improvements in Life Expectancy.

However, Asia and Africa don't begin to rise until around 1913 and 1925 respectively. In this report's exploration of Historical Context, we will examine why Africa's Life Expectancy lagged for that period of 55 years (1870 to 1925), and how it eventually rose. We will then examine why there is currently such a gap between Africa and other continents in the Contemporary Context and Disease Analysis sections.

Historical Context

Up until the 1950s, Africa was largely under colonial rule, where medical infrastructure was scant and often not available to Indigenous people. Where it was developed, it was intended for European settlers, so the benefits of newly developed medicines were not distributed well within Africa. Going into the 20th century, medical infrastructure expanded rapidly, bringing gains in Life Expectancy, but it continued to be built for European settlers, and access was still not at the standard of other continents. (1)

Following World War II, the process of decolonisation unfolded from 1950 to 1975. For some countries (like Botswana, Cape Verde and Mali) Life Expectancy rose slowly and steadily, whereas for others (like Cameroon, South Sudan and Ethiopia) it fluctuated greatly due to conflict in the region.

# Filter the data for the selected countries and time period
fluctuate = countries[(countries['Entity'].isin(['Cameroon', 'South Sudan', 'Ethiopia'])) & (countries['Year'] >= 1950) & (countries['Year'] <= 1975)]
smooth = countries[(countries['Entity'].isin(['Botswana', 'Cape Verde', 'Mali'])) & (countries['Year'] >= 1950) & (countries['Year'] <= 1975)]

# Create subplots
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Plot the first line graph
sns.lineplot(data=fluctuate, x='Year', y='Period life expectancy at birth - Sex: all - Age: 0', hue='Entity', ax=axes[0])
axes[0].set_title('Life Expectancy of Cameroon, South Sudan, and Ethiopia (1950-1975)')
axes[0].set_xlabel('Year')
axes[0].set_ylabel('Life Expectancy')

# Plot the second line graph
sns.lineplot(data=smooth, x='Year', y='Period life expectancy at birth - Sex: all - Age: 0', hue='Entity', ax=axes[1])
axes[1].set_title('Life Expectancy of Botswana, Cape Verde, and Mali (1950-1975)')
axes[1].set_xlabel('Year')
axes[1].set_ylabel('Life Expectancy')

# Adjust layout and display the plot
plt.tight_layout()
plt.show()

Despite these mixed trajectories in individual countries during the era of decolonisation, Africa's Life Expectancy grew significantly overall. It increased by about 25% over 25 years, roughly double the increase seen in Europe, Oceania and the Americas over the same period.

# Filter data for the years 1950 and 1975
continents_1950 = continents[continents['Year'] == 1950]
continents_1975 = continents[continents['Year'] == 1975]

# Calculate the percentage change for each continent
result = []
# the loop variable is named `entity` so it does not overwrite the `continent` list defined during setup
for entity in continents['Entity'].unique():
    life_expectancy_1950 = continents_1950[continents_1950['Entity'] == entity]['Period life expectancy at birth - Sex: all - Age: 0'].values[0]
    life_expectancy_1975 = continents_1975[continents_1975['Entity'] == entity]['Period life expectancy at birth - Sex: all - Age: 0'].values[0]
    percentage_change = ((life_expectancy_1975 - life_expectancy_1950) / life_expectancy_1950) * 100
    result.append({'Continent': entity, 'Percentage Change (1950-1975)': percentage_change})

# Display the results in a DataFrame
pd.DataFrame(result).sort_values(by='Percentage Change (1950-1975)', ascending=False)
   Continent  Percentage Change (1950-1975)
2  Asia       34.887683
0  Africa     24.664106
1  Americas   13.775589
3  Europe     12.252144
4  Oceania    12.049237
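
The same percentage-change calculation can also be written without an explicit loop. The following is a minimal sketch, assuming a long-format dataframe with one row per entity and year; the helper name `pct_change_between`, the toy dataframe and its `life_expectancy` column are hypothetical and stand in for the `continents` dataframe and its life expectancy column.

```python
import pandas as pd

def pct_change_between(df, start, end, value_col, entity_col="Entity", year_col="Year"):
    """Percentage change in value_col between two years, one value per entity."""
    # Keep only the two years of interest and spread them into columns.
    two_years = df[df[year_col].isin([start, end])]
    wide = two_years.pivot(index=entity_col, columns=year_col, values=value_col)
    # (end - start) / start * 100, computed for all entities at once.
    change = (wide[end] - wide[start]) / wide[start] * 100
    return change.sort_values(ascending=False)

# Toy data with made-up values, used only to demonstrate the helper.
toy = pd.DataFrame({
    "Entity": ["A", "A", "B", "B"],
    "Year": [1950, 1975, 1950, 1975],
    "life_expectancy": [40.0, 50.0, 60.0, 66.0],
})

change = pct_change_between(toy, 1950, 1975, "life_expectancy")
print(change)
```

Because the helper takes the start and end years as arguments, the same function could be reused for any other pair of years by changing only those two values.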

Then in the 1980s the 'Lost Decade' began, during which quality of life decreased and the continent failed to grow economically as expected. Its effects lasted well into the 1990s, as can be seen below in the relative flattening of Life Expectancy growth throughout the period. We can also see the fluctuations throughout the decolonisation period as some countries struggled to gain their independence. (TODO: FIND SOURCE)

#plotting
plt.figure(figsize=(12, 6))
sns.lineplot(data=continents, x='Year', y='Period life expectancy at birth - Sex: all - Age: 0', hue='Entity')

#highlight the years 1980-1998 and 1950-1975
plt.axvspan(1950, 1975, color='navajowhite', alpha=0.3, label='Decolonisation')
plt.axvspan(1980, 1998, color='peachpuff', alpha=0.3, label='Lost Decade')

plt.title('Life Expectancy Over Time by Continent')
plt.xlabel('Year')
plt.ylabel('Life Expectancy')
plt.legend()
plt.show()

As would be expected, growth was greatly reduced throughout the 'Lost Decade', with Africa's Life Expectancy increasing by only about 5% over the 18-year period.

# filter data for the years 1980 and 1998
continents_1980 = continents[continents['Year'] == 1980]
continents_1998 = continents[continents['Year'] == 1998]

# calculate the percentage change for each continent
result = []
# the loop variable is named `entity` so it does not overwrite the `continent` list defined during setup
for entity in continents['Entity'].unique():
    life_expectancy_1980 = continents_1980[continents_1980['Entity'] == entity]['Period life expectancy at birth - Sex: all - Age: 0'].values[0]
    life_expectancy_1998 = continents_1998[continents_1998['Entity'] == entity]['Period life expectancy at birth - Sex: all - Age: 0'].values[0]
    percentage_change = ((life_expectancy_1998 - life_expectancy_1980) / life_expectancy_1980) * 100
    result.append({'Continent': entity, 'Percentage Change (1980-1998)': percentage_change})
    
# display the results in a df
pd.DataFrame(result).sort_values(by='Percentage Change (1980-1998)', ascending=False)
   Continent  Percentage Change (1980-1998)
2  Asia       12.178733
1  Americas   7.723928
4  Oceania    5.986728
0  Africa     4.816649
3  Europe     3.696176
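
Because the two periods differ in length (25 years versus 18 years), the total percentage changes are not directly comparable. One possible way to put them on the same footing is a compound annual growth rate. The sketch below is an illustrative addition rather than part of the original analysis; the helper name `annualised_rate` is hypothetical, and the inputs are Africa's figures from the two tables above.

```python
def annualised_rate(total_pct_change, years):
    """Compound annual growth rate (in % per year) implied by a total % change."""
    growth_factor = 1 + total_pct_change / 100
    return (growth_factor ** (1 / years) - 1) * 100

# Africa's total percentage changes, taken from the two tables above.
decolonisation = annualised_rate(24.664106, 25)  # 1950-1975
lost_decade = annualised_rate(4.816649, 18)      # 1980-1998

print(f"1950-1975: {decolonisation:.2f}% per year")
print(f"1980-1998: {lost_decade:.2f}% per year")
```

On this measure, Africa's Life Expectancy grew at a little under 0.9% per year during decolonisation, against roughly 0.26% per year across 1980 to 1998, so the slowdown remains clear even after accounting for the shorter window.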

To summarise, Africa's medical infrastructure was sparse for much of its colonial occupation, expanding only towards the end of the era. Then from the 1950s, the decolonisation process differed across countries, leading to fluctuations throughout the period but ultimately an impressive increase of about 25% in Life Expectancy. Finally, from the 1980s into the 1990s, Life Expectancy flatlined, increasing by only about 5% in 18 years as the 'Lost Decade' brought lows in quality of life.

Sources

  1. https://www.tandfonline.com/doi/full/10.1080/20780389.2023.2209284

Contemporary Context

Disease Analysis

Conclusion